Voice encoder using a voice activity detector

ABSTRACT

A voice encoder using a voice activity detector, in which two predictive coefficients produced by an adaptive predictor in the voice encoder are received for each sample of an input voice signal of the voice encoder. The average values of the predictive coefficients are calculated for each fixed period. Each period is then classified as a voice active period or a voice non-active period by comparing the average values with respective ranges of predictive coefficient threshold values predetermined from the respective distributions of the two predictive coefficients. Voice active/non-active flags indicating the voice active period and the voice non-active period are obtained for voice operate switch exchange of the encoded output of the voice encoder.

This is a continuation of application Ser. No. 07/907,221, filed Jul. 1, 1992, now abandoned.

BACKGROUND OF THE INVENTION

The present invention relates to a voice encoder using a voice activity detector for use in a voice communication system.

Portable radio terminals, such as digital cordless telephone apparatus, employ VOX (Voice Operate Switch Exchange) control, which actuates a transmitter only during voice activity and holds it out of operation during silent intervals so as to reduce power consumption; this control reduces the mean power consumption for transmission by about 15%. To perform such a VOX function, a voice activity detector for detecting the presence or absence of a voice signal must be provided at a stage preceding the transmitter output circuit.

The following description assumes that such a voice activity detector is applied to VOX control of a digital cordless telephone apparatus. The digital cordless telephone uses a 32 kb/s adaptive differential pulse code modulation (ADPCM) system as its voice coding system (CODEC), and the processing delay in this apparatus is required to be 7 msec or less.

Since a conventional voice activity detector, described below, processes the signal in 20 msec frames, it introduces a delay of at least 20 msec, making it impossible to meet the requirement that the delay be 7 msec or less. Moreover, the conventional voice activity detector is formed independently of the voice encoder, and hence has the drawback that the amount of data to be processed is inevitably large.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a voice encoder using a voice activity detector that detects voice activity or non-activity in each short period while keeping the delay shorter than 7 msec, by making effective use of the predictive coefficients obtained during processing by a voice encoder having an adaptive prediction function.

To achieve the above object, a voice encoder is provided that has input terminals for receiving, for each sample, the digital information of an input voice signal. A subtractor subtracts, for each sample, a prediction signal from this digital information to produce a difference signal. An adaptive quantizer quantizes, for each sample, the difference signal to produce a quantized output, which is output through output terminals of the encoder. An inverse adaptive quantizer receives the quantized output and, for each sample, performs inverse adaptive quantization on it to produce a quantized difference signal. An adder adds the prediction signal and the quantized difference signal to obtain a reproduced signal. An adaptive predictor produces, for each sample, the prediction signal and two predictive coefficients from the quantized difference signal and the reproduced signal.

The voice activity detector of the voice encoder receives the two predictive coefficients, which are applied to respective framing circuits where they are framed at 5 msec intervals. The framed outputs of the framing circuits are applied to average calculator means comprising two average calculators, which calculate the average values of the two predictive coefficients for each framed period of the input voice signal. Decision means hold respective ranges of predictive coefficient threshold values precalculated from the respective distributions of the two predictive coefficients, and decide whether each framed period is a voice active period or a voice non-active period by comparing the average values with these ranges. The result is a set of voice active/non-active flags, corresponding to the voice active period and the voice non-active period, for voice operate switch exchange of the quantized output.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be described in detail below, in comparison with the prior art, with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of the voice activity detector employed in the present invention;

FIG. 2 shows timing charts explaining the operation of the voice activity detector employed in the present invention;

FIG. 3 is a block diagram of an ADPCM encoder using the voice activity detector of the present invention;

FIG. 4 shows the distributions of the predictive coefficients a₁ and a₂ for voice signals;

FIG. 5 shows the distributions of the predictive coefficients a₁ and a₂ for noise signals;

FIG. 6 is a block diagram of a conventional voice activity detector; and

FIG. 7 is a conventional decision logic flowchart.

DETAILED DESCRIPTION

To make the differences between the prior art and the present invention clear, an example of the prior art will first be described.

FIG. 6 is a block diagram of a conventional voice activity detector, which divides an input voice signal a, sampled at 8 kHz and quantized with 256 quantization levels, into 20 msec frames (160 samples each), decides voice activity or non-activity for each frame, and outputs a voice active/non-active flag. The input voice signal a is applied to a direct-current suppressor 11, in which its DC component is removed by a high-pass filter, and the output signal b is provided to each of the circuits described below.

In a high level power detector 12, the 20 msec frame is subdivided into five 4 msec subframes (32 samples each), and for each subframe a short-period power P_sk is computed by Eq. (1) [Eq. (1)], where X_i is the filter output and k is the subframe number.

For the power P_sk thus computed for each subframe, the following power detection is performed using a power threshold value Th2 (-30 dBm0).

$$D_{2k} = 1 \quad \text{when } P_{sk} \ge Th_2 \tag{2}$$

$$D_{2k} = 0 \quad \text{when } P_{sk} < Th_2 \tag{3}$$
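The per-subframe power test can be sketched as follows. This is a minimal illustration, assuming that Eq. (1) is the mean of the squared filter outputs over each 32-sample subframe (the equation itself is not reproduced in this text) and that the dBm0 threshold has already been converted into the same linear power units. The function names are hypothetical.

```python
from typing import List, Sequence

FRAME_LEN = 160      # 20 msec at 8 kHz
SUBFRAME_LEN = 32    # 4 msec at 8 kHz, five subframes per frame


def subframe_powers(frame: Sequence[float]) -> List[float]:
    """Short-period power P_sk per subframe (assumed: mean of squares)."""
    if len(frame) != FRAME_LEN:
        raise ValueError("expected a 160-sample frame")
    powers = []
    for start in range(0, FRAME_LEN, SUBFRAME_LEN):
        sub = frame[start:start + SUBFRAME_LEN]
        powers.append(sum(x * x for x in sub) / SUBFRAME_LEN)
    return powers


def power_flags(powers: Sequence[float], threshold: float) -> List[int]:
    """D_k = 1 when P_sk >= threshold, else 0 (same test for Th1 or Th2)."""
    return [1 if p >= threshold else 0 for p in powers]
```

For example, a frame whose last two subframes carry a constant amplitude of 100 yields powers `[0.0, 0.0, 0.0, 10000.0, 10000.0]`, and a threshold of 1000 flags only those two subframes. The same helper serves the low level detector by passing its own threshold.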

Further, the weighted sum D₂ given by Eq. (4) is obtained; this sum is regarded as the result of detection for one frame, and a signal c is output accordingly. [Eq. (4)]

In a low level power detector 13, the following power detection is performed on the short-period power calculated by Eq. (1), using a power threshold value Th1 (-50 dBm0).

$$D_{1k} = 1 \quad \text{when } P_{sk} \ge Th_1 \tag{5}$$

$$D_{1k} = 0 \quad \text{when } P_{sk} < Th_1 \tag{6}$$

Similarly, the weighted sum D₁ given by Eq. (7) is obtained; it is regarded as the result of detection for one frame, and a signal d is output accordingly. [Eq. (7)] At the same time, the value of Eq. (8) is calculated. [Eq. (8)]

In a zero crossing number detector 14, Z_sk is calculated for each subframe by Eq. (9) so as to count the number of zero crossings of the signal (the number of pairs of successive samples whose sign bits differ). [Eq. (9)]

For each Z_sk thus computed, the zero crossing number is tested against a zero crossing threshold value Th3 (24) as follows:

$$DZ_{sk} = 1 \quad \text{when } Z_{sk} \ge Th_3 \tag{10}$$

$$DZ_{sk} = 0 \quad \text{when } Z_{sk} < Th_3 \tag{11}$$
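The zero crossing test can be sketched as below, following the text's definition (successive samples whose sign bits differ). The sketch assumes that a sample is "negative" when x < 0 and that only pairs inside the same 32-sample subframe are counted; Eq. (9) is not reproduced here, so both points are assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence

SUBFRAME_LEN = 32
TH3 = 24  # zero crossing threshold given in the text


def zero_crossings(sub: Sequence[float]) -> int:
    """Count successive sample pairs whose signs differ (x < 0 is negative)."""
    return sum(1 for prev, cur in zip(sub, sub[1:]) if (prev < 0) != (cur < 0))


def zero_crossing_flags(frame: Sequence[float], th3: int = TH3) -> List[int]:
    """DZ_sk = 1 when Z_sk >= Th3, else 0, for each 32-sample subframe."""
    flags = []
    for start in range(0, len(frame), SUBFRAME_LEN):
        z = zero_crossings(frame[start:start + SUBFRAME_LEN])
        flags.append(1 if z >= th3 else 0)
    return flags
```

A subframe that alternates sign on every sample has 31 crossings and exceeds Th3, which is typical of high-frequency, noise-like content; a slowly varying subframe has few crossings.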

Likewise, the weighted sum D_z given by Eq. (12) is calculated, and a signal e is output as the result of detection for one frame. [Eq. (12)]

In an inter-frame power-increment comparator 15, the power P_Tn of one frame is obtained by Eq. (13). [Eq. (13)] The power thus obtained is compared with the power P_T(n-1) of the preceding frame to detect the power increment D₄, and the result is output as a signal f.

$$D_4 = 1 \quad \text{when } P_{Tn} \ge 4P_{T(n-1)} \tag{14}$$

$$D_4 = 0 \quad \text{when } P_{Tn} < 4P_{T(n-1)} \tag{15}$$
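The inter-frame comparison reduces to a ratio test. The sketch below assumes that Eq. (13) is the mean of the squared samples over the 160-sample frame (the equation is not reproduced here); the function names are hypothetical.

```python
from typing import Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Frame power P_Tn, assumed here to be the mean of squared samples."""
    if not frame:
        raise ValueError("empty frame")
    return sum(x * x for x in frame) / len(frame)


def power_increment_flag(p_curr: float, p_prev: float) -> int:
    """D4 = 1 when P_Tn >= 4 * P_T(n-1) (Eq. (14)), else 0 (Eq. (15))."""
    return 1 if p_curr >= 4.0 * p_prev else 0
```

A factor of 4 in power corresponds to an increase of about 6 dB (10 log10 4 ≈ 6.02 dB) from one frame to the next, so D₄ marks a sudden onset such as the start of speech.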

A decision circuit 16 receives the signals c, d, e and f and outputs a voice active/non-active flag indicating the result of voice activity detection, in accordance with the decision logic flow depicted in FIG. 7. In FIG. 7, HOT denotes a hang-over timer (a function by which, when the decision changes from voice activity to voice non-activity, the subsequent several frames are still set voice-active so that the voice active period does not end prematurely), and SP flag denotes the voice active/non-active flag.

[EMBODIMENT]

The present invention will hereinafter be described as applied to a 32 kb/s (kilobit/sec) ADPCM voice encoder for the digital cordless telephone.

FIG. 3 is a block diagram of the ADPCM voice encoder using a voice activity detector according to the present invention, and FIG. 1 is a block diagram illustrating an embodiment of the voice activity detector employed in the present invention.

The ADPCM encoder depicted in FIG. 3 will be described first. Reference numeral 21 denotes a uniform PCM converter, which converts a 64 kb/s μ-law PCM input signal, sample by sample, into a linear 13-bit signal. Reference numeral 22 denotes a subtractor, which subtracts a prediction signal j, the output of an adaptive predictor 23, from the output of the uniform PCM converter 21 to obtain a difference signal g. The difference signal g is quantized by an adaptive quantizer 24, and 32 kb/s voice data are provided on the transmission line as the output of the ADPCM voice encoder.

Meanwhile, an inverse adaptive quantizer 26 performs inverse adaptive quantization of the 32 kb/s voice data to obtain a quantized difference signal m. An adder 25 adds the quantized difference signal m and the prediction signal j to obtain a reproduced signal n.

The adaptive predictor 23 produces, for each sample, the prediction signal j using the predictive coefficients a_i (i = 1, 2) and b_i (i = 1, ..., 6), according to equations (16) and (17). [Eqs. (16) and (17)]

where

Se(h): prediction signal j

Sr(h-i): reproduced signal n

d_q: quantized difference signal m

h: current sampling instant
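Equations (16) and (17) are not reproduced in this text. As an illustration only, the sketch below assumes the pole-zero predictor structure used in ITU-T G.726 ADPCM: two coefficients a_i weight the past reproduced samples Sr(h-i), and six coefficients b_i weight the past quantized differences d_q(h-i). The function name `predict` is hypothetical, and the sketch is not bit-exact.

```python
from typing import Sequence


def predict(a: Sequence[float], b: Sequence[float],
            sr_past: Sequence[float], dq_past: Sequence[float]) -> float:
    """Prediction Se(h) = sum a_i*Sr(h-i) + sum b_i*d_q(h-i) (assumed form).

    sr_past[i-1] holds Sr(h-i); dq_past[i-1] holds d_q(h-i).
    """
    if len(a) != 2 or len(b) != 6:
        raise ValueError("expected 2 pole and 6 zero coefficients")
    if len(sr_past) < 2 or len(dq_past) < 6:
        raise ValueError("not enough past samples")
    pole_part = sum(a_i * s for a_i, s in zip(a, sr_past[:2]))
    zero_part = sum(b_i * d for b_i, d in zip(b, dq_past[:6]))
    return pole_part + zero_part
```

Under this assumed structure, a₁ and a₂ act like a second-order all-pole model of the signal, which is why they carry the spectral-envelope information that the voice activity detector relies on.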

The predictive coefficients a_i (i = 1, 2) and b_i (i = 1, ..., 6) are successively updated in the adaptive predictor 23 by a simplified form of the gradient projection method.
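The text does not give the update equations. For orientation, the sketch below models the a₁/a₂ update on the sign-sign gradient rules of ITU-T G.726, with leakage and stability limits |a₂| ≤ 0.75 and |a₁| ≤ 1 − 2⁻⁴ − a₂. It is a simplified floating-point illustration under that assumption, not the patent's procedure and not bit-exact. Here p is the sum of d_q and the output of the six-coefficient (b_i) section; the sign of zero is taken as 0, so that a zero p(h) produces only leakage. That sign convention is also an assumption.

```python
from typing import Tuple


def _sgn(x: float) -> int:
    """Sign function; returns 0 for x == 0 so the gradient term vanishes."""
    return (x > 0) - (x < 0)


def update_pole_coefficients(a1: float, a2: float, p0: float, p1: float,
                             p2: float) -> Tuple[float, float]:
    """One (a1, a2) update; p0, p1, p2 are p(h), p(h-1), p(h-2)."""
    s0 = _sgn(p0)
    # f(a1) limits the cross-coupling term between the two coefficients
    f_a1 = 4.0 * a1 if abs(a1) <= 0.5 else 2.0 * _sgn(a1)
    new_a2 = (1 - 2**-7) * a2 + 2**-7 * (s0 * _sgn(p2) - f_a1 * s0 * _sgn(p1))
    new_a2 = max(-0.75, min(0.75, new_a2))      # |a2| <= 0.75
    new_a1 = (1 - 2**-8) * a1 + 3 * 2**-8 * s0 * _sgn(p1)
    limit = 1 - 2**-4 - new_a2                  # |a1| <= 1 - 2^-4 - a2
    new_a1 = max(-limit, min(limit, new_a1))
    return new_a1, new_a2
```

Because each step moves a coefficient by only a few multiples of 2⁻⁸, a₁ and a₂ track the slowly changing spectral envelope rather than individual samples. This is one reason why averaging them over a short frame gives a stable feature.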

The predictive coefficients a_i (i = 1, 2) and b_i (i = 1, ..., 6) carry spectral-envelope information about the input signal. Their values are distributed differently for a voice signal, which has high autocorrelation, and for background noise, which has low autocorrelation. Accordingly, the instantaneous state of the input signal can be classified for each framed period as a voice signal or background noise from the values of the predictive coefficients a_i and b_i. In the present invention, only the coefficients a_i (i = 1, 2), and not the coefficients b_i, are used for detecting voice activity; they are applied to the voice activity detector 27.

To demonstrate this, examples of measured distributions of the two predictive coefficients a₁ and a₂ are shown in FIGS. 4(A) and 4(B) and FIGS. 5(A) and 5(B). FIG. 4(A) is for voice signals (male voices), FIG. 4(B) for voice signals (female voices), FIG. 5(A) for white noise, and FIG. 5(B) for filtered noise (-6 dB/oct).

In FIGS. 4 and 5, each sample point (white, black or double circle) represents the range of the two predictive coefficients a₁ and a₂ that is greater than -0.05 and less than +0.05 about that point as the origin. The sample point with the maximum frequency of occurrence is indicated by the double circle, and the sample points whose frequency of occurrence, normalized by that maximum, is greater than 0.1 are indicated by black circles.
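A distribution plot of this kind amounts to a two-dimensional histogram on a grid whose cells extend ±0.05 around each point. The sketch below bins (a₁, a₂) pairs into such cells and marks them as described. It assumes that the remaining occupied cells are the white circles, which the text does not state explicitly; all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

CELL = 0.1  # each grid point covers -0.05 .. +0.05 around it


def cell_of(a1: float, a2: float) -> Tuple[int, int]:
    """Index of the grid point nearest to (a1, a2), in units of 0.1."""
    return round(a1 / CELL), round(a2 / CELL)


def mark_cells(pairs: Iterable[Tuple[float, float]]) -> Dict[Tuple[int, int], str]:
    """Label each occupied cell 'double', 'black' or 'white' (assumed)."""
    counts = Counter(cell_of(a1, a2) for a1, a2 in pairs)
    if not counts:
        return {}
    peak = max(counts.values())
    marks = {}
    for cell, n in counts.items():
        if n == peak:
            marks[cell] = "double"      # maximum frequency of occurrence
        elif n / peak > 0.1:
            marks[cell] = "black"       # normalized frequency above 0.1
        else:
            marks[cell] = "white"       # assumption: other occupied cells
    return marks
```

Applied separately to coefficients measured on speech and on noise, this produces two maps whose black/double regions can then be compared to choose threshold ranges.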

FIGS. 4 and 5 show that the voice active period and the background noise period (i.e., the voice non-active period) can be distinguished using suitable threshold values for the predictive coefficients a₁ and a₂. On the basis of the distribution diagrams of FIGS. 4 and 5, the voice activity detector 27 decides that a period is a background noise period when the predictive coefficients a₁ and a₂ take values in any of the ranges (1) to (5) below, and decides that it is a voice active period when they take other values. The voice activity detector then outputs a voice detection flag at the L or H level accordingly.

(1) (0.70≦a₁ ≦1.00) and (-0.45<a₂ ≦-0.35)

(2) (0.75≦a₁ ≦1.10) and (-0.55<a₂ ≦-0.45)

(3) (0.85≦a₁ ≦1.20) and (-0.65<a₂ ≦-0.55)

(4) (0.95≦a₁ ≦1.20) and (-0.70<a₂ ≦-0.65)

(5) (a₁ ≦0.75) and (a₂ ≦0)
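The decision rule for ranges (1) to (5) can be written as a set of region tests. In this sketch, the upper bound in range (4) is taken as ≦ -0.65, consistent with ranges (1) to (3). A pair falling in any region yields the non-active level L, and any other pair yields the active level H. The names are hypothetical.

```python
from typing import Callable, List

# Background-noise regions (1)-(5) from the text, as predicates on (a1, a2).
NOISE_REGIONS: List[Callable[[float, float], bool]] = [
    lambda a1, a2: 0.70 <= a1 <= 1.00 and -0.45 < a2 <= -0.35,   # (1)
    lambda a1, a2: 0.75 <= a1 <= 1.10 and -0.55 < a2 <= -0.45,   # (2)
    lambda a1, a2: 0.85 <= a1 <= 1.20 and -0.65 < a2 <= -0.55,   # (3)
    lambda a1, a2: 0.95 <= a1 <= 1.20 and -0.70 < a2 <= -0.65,   # (4)
    lambda a1, a2: a1 <= 0.75 and a2 <= 0.0,                     # (5)
]


def voice_flag(a1_avg: float, a2_avg: float) -> str:
    """Return 'L' (voice non-active) inside any noise region, else 'H'."""
    if any(region(a1_avg, a2_avg) for region in NOISE_REGIONS):
        return "L"
    return "H"
```

For example, averages of (0.9, -0.4) fall in range (1) and give L, while (1.5, -0.9) lies outside every range and gives H.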

FIG. 1 is a block diagram illustrating an example of the construction of the voice activity detector employed in the present invention. The processing performed by each block in FIG. 1 is as follows. The predictive coefficients a₁ and a₂ are input to framing circuits 31 and 32, respectively, where they are framed at 5 msec intervals, and the framed outputs are applied to average calculators 33 and 34. Each of the average calculators 33 and 34 calculates the average value of its predictive coefficient over one frame and applies the result to a voice active/non-active detector 35. The detector 35 sets the voice detection flag to voice-non-active (L) or voice-active (H) depending on whether or not the average values of the predictive coefficients a₁ and a₂ fall inside the threshold ranges (1) to (5) above. The output of the detector 35 is provided to a hang-over processor 36, where it is subjected to 100 msec hang-over processing to obtain the final voice detection output.
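The chain of FIG. 1 (framing, averaging, decision, hang-over) can be sketched as below. At 8 kHz, a 5 msec frame is 40 samples and a 100 msec hang-over is 20 frames. The hang-over is modeled as a counter that restarts on every active frame and keeps the output at H until it expires; this exact counter behavior is an assumption. The decision function is passed in as a parameter, so any per-frame rule, such as one implementing ranges (1) to (5), can be plugged in. All names are hypothetical.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

SAMPLES_PER_FRAME = 40     # 5 msec at 8 kHz
HANGOVER_FRAMES = 20       # 100 msec / 5 msec


def frame_averages(a1_seq: Sequence[float],
                   a2_seq: Sequence[float]) -> Iterator[Tuple[float, float]]:
    """Average a1 and a2 over each complete 40-sample frame."""
    n_frames = min(len(a1_seq), len(a2_seq)) // SAMPLES_PER_FRAME
    for f in range(n_frames):
        lo, hi = f * SAMPLES_PER_FRAME, (f + 1) * SAMPLES_PER_FRAME
        yield (sum(a1_seq[lo:hi]) / SAMPLES_PER_FRAME,
               sum(a2_seq[lo:hi]) / SAMPLES_PER_FRAME)


def detect(a1_seq: Sequence[float], a2_seq: Sequence[float],
           decide: Callable[[float, float], str]) -> List[str]:
    """Per-frame 'H'/'L' flags after a 100 msec hang-over."""
    flags: List[str] = []
    hang = 0
    for a1_avg, a2_avg in frame_averages(a1_seq, a2_seq):
        if decide(a1_avg, a2_avg) == "H":
            hang = HANGOVER_FRAMES       # restart the hang-over timer
            flags.append("H")
        elif hang > 0:
            hang -= 1                    # stay voice-active while timer runs
            flags.append("H")
        else:
            flags.append("L")
    return flags
```

With one active frame followed by 25 noise frames, the output is 21 H flags (the active frame plus 20 hang-over frames) and then 5 L flags. Each decision needs only one 5 msec frame of coefficients, which is consistent with the roughly 5 msec detection time claimed for the invention.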

FIG. 2 shows timing charts of the voice activity detection operation as confirmed by computer simulation. The input signal was superimposed on filtered noise (-6 dB/oct). FIG. 2(A) shows the input signal and FIG. 2(B) the voice active/non-active decisions after hang-over processing. These results show that the system of the present invention is unlikely to respond falsely to background noise and performs well. FIGS. 2(C) and 2(D) show the temporal changes of the predictive coefficients a₁ and a₂, respectively, and confirm that a₁ and a₂ take different values in the voice active period and in the background noise period.

As described above in detail, according to the present invention, the processing time required for voice activity detection is reduced to about 5 msec. Because the detector makes efficient use of coefficients already obtained in the ADPCM processing, it can also be implemented with a small amount of hardware (the amount of data processing being 15% of that of the ADPCM system). Hence the present invention is of great practical utility.

What I claim is:
1. A voice encoder comprising:
input terminal means for receiving, for each sample, digital information of sampled values of an input voice signal;
a subtractor for subtracting, for each sample, a prediction signal from the digital information of the sampled values to produce a difference signal;
an adaptive quantizer for quantizing, for each sample, the difference signal to produce a quantized output;
output terminal means for outputting, for each sample, the quantized output;
an inverse adaptive quantizer for performing inverse-adaptive quantization, for each sample, of the quantized output to produce a quantized difference signal;
an adder for adding, for each sample, the prediction signal and the quantized difference signal to obtain a reproduced signal;
an adaptive predictor for producing, for each sample, the prediction signal and two predictive coefficients from the quantized difference signal and the reproduced signal;
average calculator means for producing respective average values of the two predictive coefficients produced in the adaptive predictor for each framed period of the input voice signal; and
decision means for holding respective ranges of predictive coefficient threshold values precalculated from respective distributions of the two predictive coefficients and for deciding whether said each framed period is a voice active period or a voice non-active period as a result of comparing the average values provided from said average calculator means with said respective ranges of predictive coefficient threshold values to obtain voice active/non-active flags in correspondence to said voice active period and said voice non-active period for voice operate switch exchange of the quantized output.
2. A voice encoder according to claim 1, in which said respective ranges of predictive coefficient threshold values are precalculated to be greater than -0.05 and smaller than +0.05 with respect to each sample.